fix(openai): permanently disable OAuth accounts with a missing refresh_token and an expired access_token #2514

Open
is7Qin wants to merge 1 commit into Wei-Shaw:main from is7Qin:fix/openai-token-missing-refresh-disable

is7Qin commented May 16, 2026

Summary

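  • token_provider: when GetAccessToken detects that the access_token has expired and the refresh_token (RT) is
    missing, it now calls SetError to permanently disable the account and clears the cache. Previously it only
    returned an error, so the account stayed active in the DB, kept being selected, and failed at the token
    stage every time, which users saw as persistent 502s.
  • ratelimit_service: in the OAuth 401 branch of HandleUpstreamError, a missing refresh_token now goes straight
    to SetError (permanent disable). The branch no longer calls SetTempUnschedulable or rewrites expires_at,
    because an account without an RT cannot be healed by any path during the 10-minute cooldown, and the end of
    the cooldown would only produce another 502.
  • Cache invalidation (tokenCacheInvalidator.InvalidateToken) and the existing path for accounts that do have
    an RT (rewrite expires_at, 10-minute cooldown, pickup by the background TokenRefreshService) are unchanged.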

Why

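In production, the OAuth account with account_id=2881 is past its expires_at and has no refresh_token in its
credentials. token_provider simply returns an error without any scheduling downgrade, and the handler treats
that error as an ordinary failure rather than an UpstreamFailoverError, so it neither switches accounts nor
evicts the current one. The next request is very likely to select the same account again, producing a 502
loop. This fix recognizes "missing RT" as a permanent failure at two independent entry points and persists it
as status=error.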

Test plan

  • go build ./...
  • go test -tags=unit ./internal/service/ -run "RateLimit|ErrorPolicy|OAuth401|TokenRefresher|TokenProvider|RefreshAPI|OAuthRefresh"
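  • Added TestOpenAITokenProvider_NoRefreshTokenExpired_DisablesAccount: covers the token_provider fallback path
  • Added TestRateLimitService_HandleUpstreamError_OAuth401NoRefreshTokenSetsError: two subcases covering an
    entirely absent RT and a whitespace-only RT
  • Audited the RT account path and found it unaffected: the old OAuth 401 cooldown tests pass once a
    refresh_token field is explicitly added for the RT path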
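
The two subcases in the test plan (RT entirely absent, RT whitespace-only) come down to one predicate. A minimal sketch, assuming credentials are held in a map keyed by "refresh_token"; the helper name hasRefreshToken and the map shape are illustrative and not taken from the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// hasRefreshToken reports whether the credentials hold a usable refresh_token.
// A missing key, a non-string value, and a whitespace-only string all count as absent.
func hasRefreshToken(credentials map[string]any) bool {
	rt, ok := credentials["refresh_token"].(string)
	return ok && strings.TrimSpace(rt) != ""
}

func main() {
	cases := []struct {
		name  string
		creds map[string]any
		want  bool
	}{
		{"missing key", map[string]any{"access_token": "at"}, false},
		{"whitespace only", map[string]any{"refresh_token": "  \t"}, false},
		{"present", map[string]any{"refresh_token": "rt-123"}, true},
	}
	for _, tc := range cases {
		got := hasRefreshToken(tc.creds)
		fmt.Printf("%-16s got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```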

Notes

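  • SetError rather than SetTempUnschedulable: a missing refresh_token is not a transient fault and cannot heal
    with time, so a permanent disable is more accurate and matches the existing convention in ratelimit_service
    for token_invalidated/token_revoked.
  • The SetError call inside token_provider uses context.Background() for the database write, so that the
    request ctx ending early does not prevent the disable from being persisted.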

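When expires_at has passed and the refresh_token is missing, token_provider only returned an error and performed no downgrade.
The OAuth 401 branch of HandleUpstreamError likewise only applied the 10-minute cooldown, without checking whether the account is able to refresh.
Together, the two paths caused accounts lacking a refresh_token to be selected repeatedly and to fail at the token stage every time, which users saw as persistent 502s.

token_provider.GetAccessToken: on "expired and no refresh_token", call SetError to permanently disable the account and clear the cache,
using a background context so that the request ctx ending early does not affect the database write.
ratelimit_service OAuth 401 branch: when refresh_token is empty, call SetError directly; expires_at is no longer written and
SetTempUnschedulable is no longer called, while cache invalidation is kept. The RT account path is untouched.

Tests were added or adjusted to cover both paths; the old tests add a refresh_token field for the RT path to preserve their original intent.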
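
The OAuth 401 branching described in this commit message can be modelled as a pure decision function. This is a hypothetical sketch: the Decision struct, the Action constants, and decideOAuth401 are illustrative names, and the real HandleUpstreamError performs side effects rather than returning a value.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Action is what the rate-limit layer does with the account after a 401.
type Action int

const (
	ActionTempUnschedulable Action = iota // cool down and let a background refresh heal it
	ActionSetError                        // permanent disable (status=error)
)

// Decision collects the side effects the 401 branch would trigger.
type Decision struct {
	Action           Action
	Cooldown         time.Duration
	RewriteExpiresAt bool
	InvalidateCache  bool
}

// decideOAuth401 mirrors the described behaviour: no refresh_token means a
// permanent disable; otherwise the pre-existing cooldown path is kept.
// Cache invalidation happens in both cases.
func decideOAuth401(refreshToken string) Decision {
	if strings.TrimSpace(refreshToken) == "" {
		return Decision{Action: ActionSetError, InvalidateCache: true}
	}
	return Decision{
		Action:           ActionTempUnschedulable,
		Cooldown:         10 * time.Minute,
		RewriteExpiresAt: true,
		InvalidateCache:  true,
	}
}

func main() {
	for _, rt := range []string{"", "   ", "rt-123"} {
		fmt.Printf("refresh_token=%q -> %+v\n", rt, decideOAuth401(rt))
	}
}
```
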
Copilot AI review requested due to automatic review settings May 16, 2026 11:47

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Permanently disables OAuth accounts that hit 401 or token expiry without a refresh_token, since they cannot self-heal and would otherwise be repeatedly selected, causing recurring 502s.

Changes:

  • In RateLimitService.HandleUpstreamError, OAuth 401 with missing/blank refresh_token now triggers SetError instead of a 10-minute temp-unschedulable cooldown.
  • In OpenAITokenProvider.GetAccessToken, expired access_token + missing refresh_token now calls a new helper that marks the account as errored and clears its cached token.
  • Added unit tests covering both new code paths.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| backend/internal/service/ratelimit_service.go | Short-circuit OAuth 401 path when refresh_token is missing/blank, calling handleAuthError. |
| backend/internal/service/ratelimit_service_401_test.go | Tests for missing and blank refresh_token cases; updates existing tests to include refresh_token. |
| backend/internal/service/openai_token_provider.go | Adds disableAccountMissingRefreshToken helper invoked when token expired & refresh_token absent. |
| backend/internal/service/openai_token_provider_test.go | Test asserting account is disabled exactly once when refresh_token missing. |
